fix(runtime): restore sys.modules['__main__'] after run-mode kernel#9313
fix(runtime): restore sys.modules['__main__'] after run-mode kernel#9313jacobcbeaudin wants to merge 2 commits intomarimo-team:mainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
d418104 to
b28aff4
Compare
b28aff4 to
05db250
Compare
akshayka
left a comment
There was a problem hiding this comment.
Hey @jacobcbeaudin, the CI issues are resolved, so we (marimo team) no longer need this fix urgently.
Can you help me understand your motivation for attempting to fix this? Are you hitting this bug in real usage of marimo, or in tests?
If you are hitting it in real usage I will consider fixing it myself. This is not an easy area of marimo to contribute to.
|
@akshayka thanks for the review, and understood on the complexity of this area. Motivation for the fix is on the issue: #9293 (comment). Given #9304 is in flight and your point about this being a delicate area, closing this PR. If useful after #9304 lands, happy to contribute the two orthogonal pieces (end-to-end spawn-after-cycle test and the bounded |
What
Closes #9293.
Run-mode (thread) kernels share
sys.moduleswith the host process.patches.patch_main_modulereplacessys.modules['__main__']with a synthetic module and has no restore path, so after aSessionMode.RUNsession ends, the host is left with the synthetic main. Any later code that readssys.modules['__main__'](notablymultiprocessing.spawn.get_preparation_dataduring subprocess bootstrap, andForkingPicklerduring target pickling) gets the wrong module. Three existing test-layer workarounds in this repo (test_asgi.pysetUp/tearDown,test_session_manager.py::_preserve_main_module,conftest.py::_save_and_restore_mainfrom #9286) currently paper over the leak.How
marimo/_runtime/patches.py: addssave_main_module/restore_main_module, refcounted with athreading.Lock. First save captures the host's original__main__; last release restores it. Unbalanced save leaks the refcount for the process lifetime; unbalanced restore is a no-op. Debug-level logs on capture and restore so forensic traces can locate the save/restore points on a polluted host.marimo/_session/managers/kernel.py:KernelManagerImpl.__init__initializes a per-instance flag andthreading.Lockthat together make the wrappers idempotent against doubleclose_kerneland safe under concurrent callers._save_host_main_module,_restore_host_main_module,_stop_and_join_run_mode_kernelencapsulate the run-mode bookkeeping.start_kernel/close_kernelbecome short call sites that delegate to these methods.close_kernelnow bounds the kernel thread join at 5 seconds; on timeout it logs a warning and skips the__main__restore rather than touching process state under a live cell.start_kernelexception path balances the save if thread setup fails.tests/_runtime/test_patches.py: 6 unit tests covering the refcount primitive (basic cycle, overlapping savers, over-release no-op, concurrent thread safety, absent__main__edge case, end-to-end host spawn after save/patch/restore).tests/_runtime/_patches_spawn_target.py: tiny helper module hosting a top-levelnoop_targetfunction used by the end-to-end spawn test.tests/_session/managers/test_kernel.py: 7 unit tests covering the application-layer wrappers (flag transition, idempotent save, save-without-restore leak contract, idempotent restore, concurrent-restore safety, restore-without-save no-op, full cycle integration).Scope
patch_main_modulecall site is unchanged (WASM environment with no persistent host to pollute).close_kernelis unchanged (subprocess is terminated; host state restore is not meaningful).Existing test-layer workarounds (
test_asgi.py,test_session_manager.py::_preserve_main_module,conftest.py::_save_and_restore_mainfrom #9286) are left in place; they can be retired in a follow-up once this lands.Testing
Run on Linux (
python:3.12.3-slim,uv sync --group dev --group test):tests/_runtime/test_patches.py: all pass (11 pre-existing + 6 new; 1 skipped for polars).tests/_session/managers/test_kernel.py: 7 new tests pass.tests/_session/managers/: 23 pre-existing tests still pass.tests/_messaging/test_thread_local_proxy.py: 21 tests pass.tests/_server/export/test_exporter.py::test_run_until_completion_with_stop,::test_run_until_completion_with_console_output: pass.Not run: full
tests/suite,tests/_islands/,tests/_server/test_asgi.py,tests/_server/test_session_manager.py, frontend-dependent HTML export tests (needmarimo/_static/index.htmlwhich is not produced by plainuv sync).Notes for reviewers
Refcount pattern mirrors
marimo/_messaging/thread_local_streams.pyfor consistency; I kept the module-level globals style rather than introducing a class wrapper.Alternative implementations considered and rejected:
launch_kernel's body (~290-line indent change; LIFO only).patch_main_module_contextaround the kernel main loop (same indent cost; LIFO only).The refcount approach handles overlapping server sessions correctly, which the existing TODO inside
patch_main_modulealready flags as a concern.Happy to adjust naming, split tests, drop defensive ones, or reshape structure on review.